(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 3 June 2004 (03.06.2004)

(10) International Publication Number: WO 2004/047430 A1



(51) International Patent Classification: H04N 5/445, 5/278

(21) International Application Number: PCT/EP2003/012261

(22) International Filing Date: 3 November 2003 (03.11.2003)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 02025474.4, 15 November 2002 (15.11.2002), EP



(71) Applicant (for all designated States except US): THOMSON
LICENSING S.A. [FR/FR]; 46 Quai A. le Gallo,
F-92100 Boulogne-Billancourt (FR).

(72) Inventors; and
(75) Inventors/Applicants (for US only): ADOLPH, Dirk
[DE/DE]; Wallbrink 2, 30952 Ronnenberg (DE). HORENTRUP,
Jobst [DE/DE]; Vossstr. 35, 30161 Hannover (DE).
OSTERMANN, Ralf [DE/DE]; Oberstr. 17, 30167 Hannover (DE).
PETERS, Hartmut [DE/DE]; Ohweg 34, 30890 Barsinghausen (DE).
SCHILLER, Harald [DE/DE]; Apfelgarten 11, 30539 Hannover (DE).

(74) Agent: RITTNER, Karsten; Deutsche Thomson-Brandt
GmbH, European Patent Operations, Karl-Wiechert-Allee
74, 30625 Hannover (DE).

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BW, BY, BZ, CA, CH, CN, CO, CR,
CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI, GB, GD,
GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR,
KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN,
MW, MX, MZ, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RU,
SC, SD, SE, SG, SK, SL, SY, TJ, TM, TN, TR, TT, TZ, UA,
UG, US, UZ, VC, VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE,
ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO,
SE, SI, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published: 

— with international search report 

For two-letter codes and other abbreviations, refer to the
"Guidance Notes on Codes and Abbreviations" appearing at the
beginning of each regular issue of the PCT Gazette.



(54) Title: METHOD AND APPARATUS FOR COMPOSITION OF SUBTITLES 



[Front-page drawing: karaoke subtitle example ("Help, I need
somebody ... Please, please help me") with the sub-CLUT
parameters SCHA, SCVA, SCH and SCW marked.]



(57) Abstract: The gist of the invention is a subtitling format
encompassing elements of enhanced syntax and semantics to
provide improved animation capabilities. The disclosed elements
improve subtitle performance without stressing the available
subtitle bitrate. This will become essential for authoring
content of high-end HDTV subtitles in pre-recorded format, which
can be broadcast or stored on high-capacity optical media, e.g.
the Blu-ray Disc. The invention gives content producers improved
authoring possibilities for animating subtitles. For subtitles
that are separate from AV material, the method includes using
one or more superimposed subtitle layers, and displaying only a
selected part of the transferred subtitles at a time. Further,
colors of a selected part of the displayed subtitles may be
modified, e.g. highlighted.

Method and Apparatus for composition of subtitles

The invention relates to a method and to an apparatus for
composition of subtitles for audio/video presentations,
which can be used e.g. for HDTV subtitles in pre-recorded
formats like the so-called Blu-ray Disc.

Background

The technique of subtitling for Audio-Visual (AV) material
has been used from the first celluloid cinema movies up to
the recent digital media. The main target of subtitling has
been the support of handicapped people or small ethnographic
language groups. Therefore subtitling often aims at the
presentation of text information, even when this has been
encoded as graphic data like pixel maps. Consequently,
pre-produced AV material for broadcasting (Closed Caption,
Teletext, DVB-Subtitle etc.) and movie discs (DVD Sub-Picture
etc.) is primarily optimized for subtitles representing
simple static textual information. However, progress in PC
software development for presentation and animation of
textual information induces a corresponding demand for
possibilities and features within the digital subtitling
technique used for pre-recording and broadcasting. With
straightforward approaches and no special precautions, these
increased requirements for subtitling would consume too large
a portion of the limited overall bandwidth. The conflicting
requirements for a 'full feature' subtitle, ranging from
karaoke all the way to genuine animations, are on the one
hand coding efficiency and on the other hand full control for
any subtitle author.

In today's state of the art of digitally subtitling AV
material with separate subtitling information, two main
approaches exist: subtitling can be based either on pixel
data or on character data. In both cases, subtitling schemes
comprise a general framework, which for instance deals with
the synchronization of subtitling elements along the AV time
axis.

Character data based subtitling:

In the character-based subtitling approach, e.g. in the
teletext system ETS 300 706 of European analog or digital
TV, strings are described by sequences of letter codes, e.g.
ASCII or UNICODE, which intrinsically allows for a very
efficient encoding. But from character strings alone,
subtitling cannot be converted into a graphical
representation to be overlaid over video. For this, the
intended character set, font and some font parameters, most
notably the font size, must either be coded explicitly within
the subtitling bitstream or an implicit assumption must be
made about them within a suitably defined subtitling context.
Also, any subtitling in this approach is confined to what can
be expressed with the letters and symbols of the specific
font(s) in use. The DVB Subtitling specification ETS 300 743,
in its mode of "character objects", constitutes another
state-of-the-art example of character-based subtitling.

Pixel data based subtitling:

In the pixel-based subtitling approach, subtitling frames
are conveyed directly in the form of graphical
representations by describing them as (typically rectangular)
regions of pixel values on the AV screen. Whenever anything is
meant to be visible in the subtitling plane superimposed onto
video, its pixel values must be encoded and provided in the
subtitling bitstream, together with appropriate
synchronization info, and hence for the full feature
animation of subtitles every changed pixel must be
transported. Obviously, while it removes the limitations that
teletext imposes on full feature animations, the pixel-based
approach carries the penalty of a considerably increased
bandwidth for the subtitling data. Examples of pixel-based
subtitling schemes can be found in DVD's sub-picture concept
("DVD Specification for Read-only disc", Part 3: Video), as
well as in the "pixel object" concept of DVB Subtitling,
specified in ETS 300 743.
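To make the bandwidth penalty concrete, the following rough calculation compares the two approaches for a single subtitle line. The text, the region size and the bit depth are illustrative assumptions, not values taken from the cited specifications, and any compression of the pixel data (e.g. run-length coding) is ignored.

```python
# Rough, uncompressed size comparison for one subtitle line.
# All figures below are illustrative assumptions.

def character_bytes(text: str, bytes_per_char: int = 1) -> int:
    """Size of a character-coded subtitle (e.g. 1 byte per ASCII letter)."""
    return len(text) * bytes_per_char


def pixel_bytes(width: int, height: int, bits_per_pixel: int = 8) -> int:
    """Size of the same subtitle sent as an uncompressed indexed pixel map."""
    return width * height * bits_per_pixel // 8


text = "Help, I need somebody"
chars = character_bytes(text)      # 21 bytes
pixels = pixel_bytes(720, 60)      # 43200 bytes for a 720x60 region
print(chars, pixels)
```

Under these assumptions the pixel map is about three orders of magnitude larger than the character string, and every animation step that changes pixels adds to that figure.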

Invention

The gist of the invention is a subtitling format encompassing
elements of enhanced syntax and semantics to provide improved
animation capabilities. The disclosed elements improve
subtitle performance without stressing the available subtitle
bitrate. This will become essential for authoring content of
high-end HDTV subtitles in pre-recorded format, which can be
broadcast or pressed on high-capacity optical media, e.g. the
Blu-ray Disc. The invention gives content producers improved
authoring possibilities for animating subtitles.

The disclosure introduces elements of syntax and semantics
describing the color change for parts of the graphics to
display. This can be used for highlight effects in
applications such as karaoke, avoiding the repeated transfer
of pixel data.

Other disclosed elements of syntax and semantics make it
possible to crop parts of the subtitles before displaying
them. By subsequently transferring only new cropping
parameters for an object to display, a bit-saving animation
of subtitles becomes available. Such cropping parameters can
be used for example to generate text changes by wiping boxes,
blinds, scrolling, wipes, checker boxes, etc.

Furthermore, the disclosed elements can be used to provide
interactivity on textual and graphical information. In
particular, the positioning and/or color settings of
subtitles can be manipulated upon user request.

Drawings

Exemplary embodiments of the invention are described with
reference to the accompanying drawings and tables, which
show:

Fig. 1: segment_type values for enhanced PCS and RCS;
Fig. 2: Enhanced page composition segment;
Fig. 3: Enhanced region composition segment;
Fig. 4: Example for the definition of a subtitle region and
its location within a page;
Fig. 5: Example for definition of a region sub-CLUT and
region cropping;
Fig. 6: Resulting display example;
Fig. 7: Interactive usage of subtitles;
Fig. 8: Video and Graphics Planes;
Fig. 9: Video and Graphics Mixing and Switching.

Exemplary embodiments

The invention can preferably be embodied based on the syntax
and semantics of the DVB subtitle specification (DVB-ST).
To provide improved capabilities for the manipulation of
graphic subtitle elements, the semantics of DVB-ST's page
composition segment (PCS) and region composition segment
(RCS) are expanded.

DVB-ST uses page composition segments (PCS) to describe the
positions of one or more rectangular regions on the display
screen. The region composition segments (RCS) are used to
define the size of any such rectangular area and to identify
the color look-up table (CLUT) used within.

The proposed invention keeps backward compatibility with
DVB-ST by using different segment_types for the enhanced PCS
and RCS elements, as listed in Fig. 1, which shows segment
type values according to DVB-ST, with additional values for
enhanced PCS and enhanced RCS. It would also be possible to
choose other values instead. Another approach for keeping
backward compatibility would be to keep the existing
segment_types and increase the version_number of the
specification, e.g. by incrementing the subtitle_stream_id in
the PES_data_field structure.
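The first compatibility approach can be sketched as follows. The values 0x10 to 0x13 are the segment_type values defined by DVB-ST; the two values for the enhanced segments are hypothetical placeholders (the actual assignment is the one listed in Fig. 1), as is the function name. The sketch also assumes that a decoder skips every segment whose type it does not know.

```python
# segment_type values defined by the DVB subtitling specification.
PAGE_COMPOSITION = 0x10
REGION_COMPOSITION = 0x11
CLUT_DEFINITION = 0x12
OBJECT_DATA = 0x13

# Hypothetical values for the enhanced segments (placeholders only).
ENHANCED_PAGE_COMPOSITION = 0x40
ENHANCED_REGION_COMPOSITION = 0x41

LEGACY_TYPES = {PAGE_COMPOSITION, REGION_COMPOSITION, CLUT_DEFINITION, OBJECT_DATA}
ENHANCED_TYPES = {ENHANCED_PAGE_COMPOSITION, ENHANCED_REGION_COMPOSITION}


def segments_to_decode(segments, supports_enhanced):
    """Keep only the (segment_type, payload) pairs this decoder understands."""
    known = (LEGACY_TYPES | ENHANCED_TYPES) if supports_enhanced else LEGACY_TYPES
    return [(seg_type, payload) for seg_type, payload in segments if seg_type in known]


stream = [
    (PAGE_COMPOSITION, b"pcs"),
    (ENHANCED_PAGE_COMPOSITION, b"enhanced pcs"),
    (REGION_COMPOSITION, b"rcs"),
]
legacy_view = segments_to_decode(stream, supports_enhanced=False)
enhanced_view = segments_to_decode(stream, supports_enhanced=True)
```

In this model a legacy decoder never sees the enhanced segments, so an existing DVB-ST stream layout is left undisturbed.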

Fig. 2 shows the data structure of an enhanced page
composition segment (PCS), containing a region_cropping
section and a region_sub_CLUT section. Fig. 3 shows the data
structure of an enhanced region composition segment (RCS),
containing an identifier sub_CLUT_id for a sub-color look-up
table. With respect to original DVB-ST, all structures shown
are expanded. In the tables the additional entries are lines
15-28 in Fig. 2 and line 16 in Fig. 3.

The enhanced PCS shown in Fig. 2 carries optional information
about the region cropping and optional information about the
region sub-CLUT for every region listed. The two values of
region_cropping and region_sub_CLUT indicate whether such
optional information is available for the region currently
being processed. Therefore cropping and sub-CLUT may be
defined separately for every region. While region_cropping is
used as a flag, as indicated by "if region_cropping==0x01",
the value of region_sub_CLUT gives the number of sub-CLUT
positions that are described. This is done to provide
different alternatives within the stream. Alternative
sub-CLUT positions can be used to define different menu
button positions for the display screen. Only one of them,
by default the first one, is active, and the user can change
the position to navigate through the different predefined
positions, for example by pressing buttons on the remote
control.
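One possible in-memory model of a region entry of the enhanced PCS is sketched below. The class and attribute names are hypothetical and no bit widths are implied; the actual field layout is the one given in Fig. 2. The sketch only mirrors the semantics described above: an optional cropping rectangle signalled by a flag, a count of alternative sub-CLUT positions, and the first alternative being active by default. Wrapping around after the last alternative is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rect:
    x: int
    y: int
    width: int
    height: int


@dataclass
class EnhancedRegionEntry:
    region_id: int
    cropping: Optional[Rect] = None                       # present only if flagged
    sub_cluts: List[Rect] = field(default_factory=list)   # alternative positions
    active: int = 0                                       # first one is the default

    @property
    def region_cropping(self) -> int:
        """Flag value: 0x01 if cropping information is carried, else 0x00."""
        return 0x01 if self.cropping is not None else 0x00

    @property
    def region_sub_CLUT(self) -> int:
        """Number of sub-CLUT positions described for this region."""
        return len(self.sub_cluts)

    def active_sub_clut(self) -> Optional[Rect]:
        return self.sub_cluts[self.active] if self.sub_cluts else None

    def select_next(self) -> None:
        """Step to the next predefined position, e.g. on a remote key press.

        Wrapping around at the end of the list is an assumption.
        """
        if self.sub_cluts:
            self.active = (self.active + 1) % len(self.sub_cluts)


entry = EnhancedRegionEntry(
    region_id=1,
    sub_cluts=[Rect(0, 0, 40, 20), Rect(50, 0, 40, 20)],
)
entry.select_next()   # the second menu button position is now active
```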

The enhanced RCS shown in Fig. 3 carries the sub_CLUT_id
identifying the family of CLUTs that applies to this region.
This is done to re-use CLUTs for different regions and for
different region sub-CLUTs as well.

The enhanced PCS and enhanced RCS elements make it possible
to manipulate subtitles independently of the encoding method,
i.e. independently of whether they are encoded as character
data or pixel data.

The enhanced PCS and RCS can be used to perform many
different animation effects for subtitles. Those could be
wiping boxes, blinds, scrolling, wipes, checker boxes, etc.
The following figures show an application example for
karaoke. Fig. 4 shows the definition of a region R containing
lyrics of a song displayed for karaoke. The letters of the
subtitle may be encoded as pixel data or as character data.
The region_vertical_address RVA and the
region_horizontal_address RHA define the location of the
subtitle within the frame, or page PG, to display.

Fig. 5 depicts in the upper part region cropping, and in the
lower part the location of the region sub-CLUT. Region
cropping defines which part of the region is effectively
displayed. This is achieved by four parameters RHC, RVC, RCH,
RCW indicating the start coordinates and the size of the
fragment to display. region_horizontal_cropping RHC specifies
the horizontal address of the top left pixel of this
cropping, region_vertical_cropping RVC specifies the vertical
address of the top line of this cropping,
region_cropping_width RCW specifies the horizontal length of
this cropping, and region_cropping_height RCH specifies the
vertical length of this cropping, wherein cropping means
that part of the subtitles that is visible on a display.
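The four cropping parameters can be illustrated with a minimal sketch. It assumes that the region is held as a list of pixel rows and that RHC and RVC are counted from the top left corner of the region; the function name is hypothetical.

```python
def crop_region(region, rhc, rvc, rcw, rch):
    """Return the visible fragment of a region given as a list of pixel rows.

    rhc, rvc: column and row of the top-left pixel of the cropping
    rcw, rch: width and height of the cropping, in pixels
    """
    return [row[rhc:rhc + rcw] for row in region[rvc:rvc + rch]]


# A 3x8 region whose pixels are written as characters for readability.
region = [
    list("ABCDEFGH"),
    list("IJKLMNOP"),
    list("QRSTUVWX"),
]

visible = crop_region(region, rhc=2, rvc=1, rcw=4, rch=2)
# visible == [['K', 'L', 'M', 'N'], ['S', 'T', 'U', 'V']]

# A left-to-right wipe: the pixel data stay the same, only RCW grows.
wipe = [crop_region(region, 0, 0, width, 3) for width in (2, 4, 8)]
```

The wipe at the end shows why such an animation is cheap: each step needs only a new set of four numbers, not new pixel data.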

The region sub-CLUT location shown in the lower part of
Fig. 5 defines which part of the region has to be displayed
using a color look-up table (CLUT) different from the region
CLUT. This is achieved by four parameters SCHA, SCVA, SCH, SCW
indicating the start coordinates and the size of the
sub-region used by the sub-CLUT. All coordinate parameters are
to be understood relative to the region the sub-CLUT belongs
to. sub_CLUT_horizontal_address SCHA specifies the horizontal
address of the top left pixel of this sub-CLUT,
sub_CLUT_vertical_address SCVA specifies the vertical
address of the top line of this sub-CLUT, sub_CLUT_width SCW
specifies the horizontal length of this sub-CLUT and
sub_CLUT_height SCH specifies the vertical length of this
sub-CLUT.
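How a sub-CLUT changes the color of part of a region without touching its pixel data can be sketched as follows. The region is assumed to be stored as rows of CLUT indices, and both look-up tables are modelled as plain dictionaries with made-up color names; the function name is hypothetical.

```python
def render_region(indices, region_clut, sub_clut, scha, scva, scw, sch):
    """Map CLUT indices to colors, using sub_clut inside the sub-region.

    indices: rows of CLUT indices for the region
    scha, scva: column and row of the sub-region's top-left pixel,
                relative to the region
    scw, sch: width and height of the sub-region
    """
    rendered = []
    for y, row in enumerate(indices):
        out_row = []
        for x, index in enumerate(row):
            inside = scha <= x < scha + scw and scva <= y < scva + sch
            table = sub_clut if inside else region_clut
            out_row.append(table[index])
        rendered.append(out_row)
    return rendered


region_clut = {0: "transparent", 1: "white"}
sub_clut = {0: "transparent", 1: "yellow"}   # highlight color
indices = [[1, 1, 0, 1, 1, 1]]               # one row of "text" pixels

frame = render_region(indices, region_clut, sub_clut, scha=0, scva=0, scw=3, sch=1)
# frame == [['yellow', 'yellow', 'transparent', 'white', 'white', 'white']]
```

Increasing SCW over time moves the highlight across the text, which is the karaoke effect described in the following paragraph.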

Combining all parameters defined in the previous figures
results in the displayed subtitle depicted in Fig. 6. The
subtitle is not shown in whole on the display, but only the
cropped part of it. Furthermore, the sub-CLUT is used to
provide a highlight HT, so that the user knows what to sing
at the moment.

As the enhanced PCS are sent within MPEG packetized
elementary stream (PES) packets labeled by presentation time
stamps (PTS), any effect can be synchronized to the AV.
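Synchronization can then be reduced to choosing, at each instant, the most recent parameter set whose time stamp has been reached. The sketch below assumes PTS values counted in 90 kHz clock ticks, as used in MPEG systems; the update list, its values and the function name are hypothetical.

```python
import bisect


def active_parameters(updates, now_pts):
    """Return the parameters of the latest update with pts <= now_pts.

    updates: list of (pts, params) tuples sorted by ascending pts
    Returns None if no update is due yet.
    """
    times = [pts for pts, _ in updates]
    position = bisect.bisect_right(times, now_pts) - 1
    return updates[position][1] if position >= 0 else None


# Karaoke highlight growing by 40 pixels every half second (45000 ticks).
updates = [
    (90000, {"SCW": 40}),
    (135000, {"SCW": 80}),
    (180000, {"SCW": 120}),
]

current = active_parameters(updates, now_pts=140000)   # {"SCW": 80}
```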

Another idea of the invention is the superseding of subtitle
animation parameters by the user. This offers a way to
realize interactive subtitles. The enhanced PCS parameters
are transferred as a default, and the user may change them,
for example via a remote control. Thus the user is able to
move, crop or highlight the subtitle.

This could be an advantage for a user-defined repositioning
of a subtitling text, so that the user can subjectively
minimize the annoyance caused by the placement of the
subtitle text on top of the motion video. Also, the color of
the subtitles could be set according to the user's
preferences. Fig. 7 shows a block diagram for interactive
subtitle modifications. The default parameters DD read from a
disc D are superseded by superseding data SD, which are
generated upon the user action UA and processed by a
processor P.

Another application for overriding subtitle animation
parameters like position, cropping rectangle, CLUTs and
sub-CLUTs is the realization of some very basic sort of
interactive gaming. The subtitle may carry pixel data of an
animated character. This character is subsequently moved on
the display screen, driven by user interaction, programmatic
control or both.

The overriding of subtitle animation parameters can be
implemented in at least two ways. The first option is that
the overriding parameters SD replace the parameters DD sent
in the bitstream. The second option is that the overriding
parameters SD are used as an offset that is added to or
subtracted from the subtitle animation parameters DD sent in
the bitstream.
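Both options can be sketched in a few lines. The parameter sets are modelled as dictionaries; the function name, the mode strings and the rule that parameters missing from SD keep their default value are assumptions made for this illustration.

```python
def apply_override(defaults, override, mode):
    """Combine default parameters DD with user-generated parameters SD.

    mode "replace": values in override replace the defaults
    mode "offset":  values in override are added to the defaults
                    (a negative value subtracts)
    Parameters missing from override keep their default value.
    """
    if mode == "replace":
        return {**defaults, **override}
    if mode == "offset":
        return {key: value + override.get(key, 0) for key, value in defaults.items()}
    raise ValueError(f"unknown mode: {mode}")


dd = {"RHA": 100, "RVA": 400}                        # defaults from the bitstream
moved = apply_override(dd, {"RVA": 50}, "replace")   # {"RHA": 100, "RVA": 50}
nudged = apply_override(dd, {"RVA": -20}, "offset")  # {"RHA": 100, "RVA": 380}
```

With the offset option a user adjustment keeps following any animation carried in the stream, whereas the replace option pins the parameter to the user's value.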

The enhanced PCS and RCS provide many more animation
capabilities than those explained here. A non-exhaustive list
of examples: wiping boxes, blinds, scrolling, wipes and
checker boxes.

Exemplary video and graphics planes are shown schematically
in Fig. 8. A background is provided by either an MPEG-2 video
layer MVL or a still picture layer SPL. They are mutually
exclusive, which means that the two need not be held in a
buffer at the same time. The next two layers comprise a
subtitle layer SL and an AV sync type graphics layer AVSGL.
These two layers are in this example interchangeable, meaning
that either the subtitle layer SL or the AV sync type
graphics layer AVSGL may have priority over the other. The
front layer is a non-AV sync graphics layer NAVSGL,
containing graphics that need not be synchronized with the AV
content, such as menus or other on-screen displays. The
inventive method can preferably be used for the subtitle
layer SL, the AV sync graphics layer AVSGL and/or the non-AV
sync graphics layer NAVSGL.

Fig. 9 shows relevant components of an apparatus for video
and graphics mixing and switching. Data comprising either
still picture data or MPEG-2 video data, further data for
subtitles, data for animations and data for non-AV sync
graphics such as menu buttons are retrieved from a disc D.
Additionally or alternatively, data for subtitles,
animations and/or non-AV sync graphics can be received from a
network NW, e.g. the internet. A processing unit CPU
processes the non-AV sync graphics data and sends the
resulting data to a rendering device for non-AV sync graphics
RNAVG.

The apparatus contains a still picture decoder SPDec and an
MPEG-2 video decoder MVDec, but since only one of them is
used at a time, a switch s1 can select which data shall be
used for further processing. Moreover, two identical
decoders AVSGDec1, AVSGDec2 are used for decoding subtitle
and animation data. The outputs of these two decoders
AVSGDec1, AVSGDec2 may be switched by independent switches
s2, s3 to either a mixer MX, or for preprocessing to a mixer
and scaler MXS, which outputs its resulting data to said
mixer MX. These two units MX, MXS are used to superimpose
their various input data, thus controlling the display order
of the layers. The mixer MX has inputs for a front layer f2,
a middle front layer mf, a middle back layer mb and a
background layer b2. The front layer f2 may be unused if the
corresponding switch s3 is in a position to connect the
second AV sync graphics decoder AVSGDec2 to the mixer and
scaler MXS. This unit MXS has inputs for a front layer f1, a
middle layer m and a background layer b1. It superimposes
these data correspondingly and sends the resulting picture
data to the background input b2 of the mixer MX. Thus, these
data represent e.g. a frame comprising up to three layers of
picture and subtitles, which can be scaled and moved together
within the final picture. The background input b1 of the
mixer and scaler MXS is connected to the switch s1 mentioned
above, so that the background can be generated from a still
picture or an MPEG-2 video.

The output of the first AV sync graphics decoder AVSGDec1 is
connected to a second switch s2, which may switch it to the
middle layer input m of the mixer and scaler MXS or to the
middle back layer input mb of the mixer MX. The output of
the second AV sync graphics decoder AVSGDec2 is connected to
a third switch s3, which may switch it to the front layer
input f1 of the mixer and scaler MXS or to the middle front
layer input mf of the mixer MX.

Depending on the positions of the second and third switch
s2, s3, either the output of the first or of the second AV
sync graphics decoder AVSGDec1, AVSGDec2 may have priority
over the other, as described above. To have the data from the
first decoder AVSGDec1 in the foreground, the second switch
s2 may route the subtitle data to the middle back input mb
of the mixer MX, while the third switch s3 routes the
animation graphics data to the front input f1 of the mixer
and scaler MXS, so that it ends up at the background input b2
of the mixer MX. Otherwise, to have the data from the second
decoder AVSGDec2 in the foreground, the switches s2, s3 may
route their outputs to the same unit, either the mixer and
scaler MXS or the mixer MX, as shown in Fig. 9.
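The effect of the switch positions on the display order can be checked with a small model of Fig. 9. It is a sketch under two assumptions that the text does not state explicitly: the renderer RNAVG feeds the front input f2 of the mixer MX, and each unit stacks its inputs strictly from background to front. The function name and the string labels are hypothetical.

```python
def layer_order(s2, s3):
    """Return the layers from back to front for given switch positions.

    s2 routes AVSGDec1 to input m of "MXS" or input mb of "MX".
    s3 routes AVSGDec2 to input f1 of "MXS" or input mf of "MX".
    """
    for position in (s2, s3):
        if position not in ("MXS", "MX"):
            raise ValueError(f"unknown switch position: {position}")

    # Mixer and scaler MXS: b1 (still picture or video), then m, then f1.
    order = ["background"]
    if s2 == "MXS":
        order.append("AVSGDec1")
    if s3 == "MXS":
        order.append("AVSGDec2")

    # Mixer MX: b2 (output of MXS), then mb, then mf, then f2.
    if s2 == "MX":
        order.append("AVSGDec1")
    if s3 == "MX":
        order.append("AVSGDec2")
    order.append("RNAVG")   # assumed to be connected to f2
    return order


first_in_front = layer_order(s2="MX", s3="MXS")
second_in_front = layer_order(s2="MX", s3="MX")
```

In this model the first decoder ends up in front of the second only when s2 selects MX and s3 selects MXS, which matches the routing described above; in the other three combinations the second decoder has priority.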
Claims

1. Method for composition of subtitles for audio/video
presentations, wherein subtitle information is separate
from audio/video material, and subtitle information is
transferred from a network or a storage medium, such as
a disc, characterized in

- using one or more subtitle layers; and

- cropping parts of the subtitles of a layer or layers
before displaying them, so that only a selected
(RHC, RVC, RCH, RCW) part of the transferred subtitles
is displayed at a time.

2. Method according to claim 1, wherein the colors of a
specified (SCHA, SCVA, SCH, SCW) part of the subtitles may
be modified.

3. Method according to claim 1 or 2, wherein subtitles may
be interactively moved, cropped or highlighted, or the
colors of subtitles be interactively modified by a
user.

4. Method according to any of the previous claims, wherein
the subtitles may contain graphics.

5. Method according to any of the previous claims, wherein
the AV material and the subtitles comply with the
DVB-ST standard.

6. Apparatus for composition of subtitles, the apparatus
mixing and switching video and graphics data, the data
being read from a storage medium or received from a
network and comprising still picture data or MPEG video
data, data for at least two layers of subtitles or
animations, and optionally data for non-synchronized
graphics, the apparatus comprising

- a mixer (MX) that may superimpose video data of a
back layer, at least two middle layers and a front
layer;

- a mixer and scaler (MXS) that may superimpose video
data of a back layer, a middle layer and a front
layer, the mixer and scaler (MXS) providing its
output data to the mixer (MX);

- a video decoder (MVDec) and/or a still picture
decoder (SPDec), wherein the output data of either the
video decoder or the still picture decoder may be
switched (s1) to the mixer and scaler (MXS);

- at least two simultaneously working decoders
(AVSGDec1, AVSGDec2) for synchronized graphics or
subtitles, wherein the output of each of the decoders
may be switched (s2, s3) to either the mixer (MX) or
the mixer and scaler (MXS), and wherein a decoder
(AVSGDec1, AVSGDec2) may select a part
(RHC, RVC, RCH, RCW) of its input data to be output for
display;

- a renderer for the non-synchronized graphics,
providing data to the mixer (MX).

7. Apparatus according to claim 6, wherein a decoder
(AVSGDec1, AVSGDec2) may apply a different color look-up
table to a specified (SCHA, SCVA, SCH, SCW) part of a
subtitle layer.

8. Apparatus according to claim 6 or 7, comprising a
subtitle decoder (ST-DEC) that is capable of superseding
default subtitle parameters (DD) with other subtitle
parameters (SD) generated upon user action, for
interactively modifying or highlighting subtitles.

9. Apparatus according to any of claims 6-8, wherein the
data comply with the DVB-ST standard.